Journal of the American Medical Informatics Association

53 training papers 2019-06-25 – 2026-03-07

Top medRxiv preprints most likely to be published in this journal, ranked by match strength.

1
Automation in Clinical Trial Statistical Programming: A Structured Review of TLF Generation, Validation Frameworks, and AI/ML Integration
2025-12-29 health informatics 10.64898/2025.12.24.25342988
#1 (24.6%)

Background: Clinical trial statistical programming is transitioning from manual, study-specific coding toward metadata-driven, automated pipelines. Although general data management transformation has been reviewed, comprehensive synthesis of statistical programming automation--particularly tables, listings, and figures (TLF) generation and validation frameworks--remains limited. This review addresses this gap through systematic evidence synthesis. Methods: We conducted a structured literature revie...

2
Show Your Work: Verbatim Evidence Requirements and Automated Assessment for Large Language Models in Biomedical Text Processing
2026-03-04 health informatics 10.64898/2026.03.03.26346690
#1 (23.7%)

Purpose: Large language models (LLMs) are used for biomedical text processing, but individual decisions are often hard to audit. We evaluated whether enforcing a mechanically checkable "show your work" quote affects accuracy, stability, and verifiability for trial eligibility-scope classification from abstracts. Methods: We used 200 oncology randomized controlled trials (2005–2023) and provided models with only the title and abstract. Trials were labeled with whether they allowed for the inclusio...

3
Optimizing an LLM-Based Clinical Data Querying System Using Metadata Enrichment and Task Decomposition
2025-12-23 health informatics 10.64898/2025.12.22.25342863
#1 (23.5%)

Accessing complex clinical registries traditionally requires SQL programming expertise, limiting data accessibility for non-technical researchers. In this paper, we designed a text-to-SQL solution based on large language models (LLMs) and evaluated whether it could enable natural language querying of a real-world clinical registry under strict privacy and security constraints. Using self-hosted, open-source LLMs, we developed a multi-layered optimization framework incorporating metadata enrichment,...

4
Fully Automated Systematic Review Generation via Large Language Models: Quality Assessment and Implications for Scientific Publishing
2026-02-23 health informatics 10.64898/2026.02.18.26346559
#1 (23.2%)

Large language models (LLMs) are increasingly transforming scientific workflows, yet their application to rigorous evidence synthesis remains underexplored. We present a fully automated pipeline, run by executing a single Python script, that leverages the Claude API to generate systematic reviews from literature search through manuscript completion without human intervention. Our pipeline processes hundreds of papers through iterative API calls for inclusion evaluation, information extraction...

5
Development and validation of an algorithm to identify front-line clinicians using EHR audit log data
2026-02-16 health informatics 10.64898/2026.02.13.26346268
#1 (23.1%)

Background: Interprofessional teams are central to high-quality patient care. However, identifying the clinician primarily responsible for a patient requires labor-intensive methodologies. Although electronic health record (EHR) audit logs offer a scalable alternative, their use for identifying frontline clinicians is underdeveloped. Objective: To develop and validate an algorithm utilizing EHR audit logs to identify the primary frontline clinician per patient day of an encounter and to describe care...

6
Validation of 13,102 ICD-10-CM Codes Using a Large Language Model-Based System
2025-12-31 health informatics 10.64898/2025.12.30.25343244
#1 (22.3%)

Objective: To comprehensively evaluate the validity of ICD-10-CM codes for both prevalent diagnoses and less common diseases, and to assess the performance of a large language model (LLM)-based system in validating these codes. Materials and Methods: This retrospective study analyzed hospital admissions from the Medical Information Mart for Intensive Care (MIMIC-IV) database. We developed a validated LLM-based system using GPT-4o, refined through iterative prompt engineering, to assess ICD-10-CM co...

7
Evaluating a Locally Deployed 20-Billion Parameter Large Language Model for Automated Abstract Screening in Systematic Reviews
2026-03-04 health informatics 10.64898/2026.03.04.26347506
#1 (22.2%)

Background: Systematic reviews (SRs) are essential for evidence-based medicine but require extensive time and resources for abstract screening. Large language models (LLMs) offer potential for automating this process, yet concerns about data privacy, intellectual property protection, and reproducibility limit the use of cloud-based solutions in research settings. Objective: To evaluate the performance of a locally deployed 20-billion parameter LLM for automated abstract screening in systematic revi...

8
Clinicians' Rationale for Editing Ambient AI-Drafted Clinical Notes: Persistent Challenges and Implications for Improvement
2026-02-22 health informatics 10.64898/2026.02.20.26346729
#1 (22.2%)

Objective: The use of ambient AI documentation tools is rapidly growing in US hospitals and clinics. Such tools generate the first draft of clinical notes from scribed patient-provider conversations, which clinicians can then review and edit before signing into electronic health records (EHR). Understanding how and why clinicians make modifications to AI-generated drafts is critical to improving AI design and clinical efficiency, yet it has been under-studied. Th...

9
Embedded point of care stratified block randomization: demonstration of the Point of Care Randomization (POCR) engine with an electronic health record pragmatic clinical trial
2026-01-28 health informatics 10.64898/2026.01.26.26344847
#1 (22.1%)

We describe a new custom feature within our Epic Systems electronic health record (EHR) that automates stratified randomization at the point-of-care or order. As a demonstration use-case, we conducted a randomized trial of a provider-facing alert for short-interval HbA1c orders. Over 3 months the alert dramatically reduced repeat orders. This transportable clinical informatics application transforms health systems' ability to conduct pragmatic clinical trials and deliver clinical care within the ...

10
Using local and statewide Electronic Health Record data to evaluate the impact of telemedicine in Virginia
2026-01-09 health informatics 10.64898/2026.01.08.26343531
#1 (22.1%)

Objective: To analyze the impact of telemedicine on emergency department (ED) utilization among University of Virginia (UVA) Health System patients, examining which patient characteristics predict reduced ED usage and whether telemedicine reduces ED utilization. Materials and Methods: We used UVA Electronic Health Records and public datasets to establish clinical and contextual features including demographics, comorbidities, insurance status, and community characteristics. UVA patient data were lin...

11
Safety and Utility of an Agentic Large Language Model-Based Hospital Course Summarizer: A Prospective Real-World Pilot Study
2026-02-06 health informatics 10.64898/2026.02.05.26345607
#1 (22.0%)

Importance: High-quality discharge summaries are essential for safe care transitions but contribute substantially to clinician documentation burden and burnout. While retrospective studies suggest large language models (LLMs) can generate clinical summaries of comparable quality to physicians, prospective data on their safety, utility, and impact on clinician well-being in real-world environments are lacking. Objective: To evaluate the safety, utilization, and impact on clinician burden of MedAgent...

12
Phecoder: semantic retrieval for auditing and expanding ICD-based phenotypes in EHR biobanks
2026-01-11 health informatics 10.64898/2026.01.08.26343725
Top 0.1% (21.9%)

Background: Electronic health record (EHR)-based phenotyping underpins genome-wide association studies, yet current ICD-code phenotypes rely heavily on manually curated lists such as Phecodes. These definitions are labour-intensive to maintain, inherently subjective, and may omit clinically relevant diagnostic codes, reducing study power. Advances in text embedding models offer an opportunity to automate and standardize ICD-based phenotype construction. Methods: We developed Phecoder, an ensemble o...

13
Boards-style benchmarks overestimate prior-chat bias in large language models: a factorial evaluation study
2026-02-14 health informatics 10.64898/2026.02.12.26346164
Top 0.1% (21.7%)

Background: Large language models (LLMs) are increasingly piloted as chat interfaces for chart review and clinical decision support. Although leading models achieve and even exceed physician-level accuracy on exam-style benchmarks such as MedQA, recent perturbation studies show large drops in accuracy after small changes to prompts, distractor content, or answer format. Prior work has not systematically examined how these vulnerabilities unintentionally manifest in clinically realistic settings, i...

14
Sino-US-DrugQA: A Benchmark for Evaluating Large Language Models in Cross-Jurisdictional Pharmaceutical Regulation
2026-02-17 health informatics 10.64898/2026.02.13.26346236
Top 0.1% (21.6%)

Cross-jurisdictional pharmaceutical compliance requires comparative analysis of regulatory requirements across jurisdictions such as the US FDA and China's NMPA. Although large language models (LLMs) are increasingly explored for healthcare-related applications, their performance in cross-jurisdictional regulatory comparison has not been systematically characterized using dedicated benchmarks. This study introduces Sino-US-DrugQA, a bilingual benchmark dataset designed to evaluate LLM performance...

15
Personas Shift Clinical Action Thresholds in Large Language Models
2026-01-02 health informatics 10.64898/2026.01.01.26343302
Top 0.1% (21.4%)

Background and aims: Clinical LLM deployment is shifting from feasibility to liability, while current guidance largely treats model behavior as a control problem. We tested whether decision-style system prompts shift clinical action thresholds when clinical facts are held constant, and whether these shifts are consistent across settings and models. Methods: We defined nine physician personas by crossing three ethical orientations (duty-, care-, utilitarian) with three cognitive styles (intuitive, i...

16
Agentic Trial Emulation to Learn Health System-specific Drug Effects At Scale
2026-02-20 health informatics 10.64898/2026.02.19.26346539
Top 0.1% (21.2%)

Objective: Electronic Health Record (EHR)-based trial emulation can support translation of randomized clinical trial (RCT) evidence into practice, yet emulations often diverge from published RCT results. We hypothesized that these discrepancies are structured and learnable properties of a health system's data-generating process, and that autonomous agentic workflows can generate discrepancies at the scale required for cumulative learning. Materials and Methods: We developed an agentic trial emulatio...

17
MedEvalArena: A Self-Generated, Peer-Judged Benchmark for Medical Reasoning
2026-01-29 health informatics 10.64898/2026.01.27.26344905
Top 0.2% (18.5%)

Large Language Models (LLMs) demonstrate strong performance at medical specialty board multiple-choice question (MCQ) answering but underperform in more complex medical reasoning scenarios. This gap indicates a need for improving both LLM medical reasoning and evaluation paradigms. We introduce MedEvalArena, a framework in which LLMs engage in a symmetric round-robin format. Each model generates challenging board-style medical MCQs, then serves in an ensemble LLM-as-judge bench to adjudica...

18
When AI Meets the FDA: An Evaluation of Large Language Models Performance in Regulatory and Clinical Trial Data Extraction, Synthesis, and Analysis
2025-12-27 health policy 10.64898/2025.12.22.25342875
Top 0.2% (18.5%)

Introduction: Clinical and population decision-making relies on the systematic evaluation of extensive regulatory evidence. The FDA drug reviews provide detailed information on clinical trial design, enrollment criteria, sample size, randomization, comparators, endpoints, and indications. However, extracting these data is resource-intensive and time-consuming. Generative Artificial Intelligence large language models (LLMs) may accelerate the extraction and synthesis of such information. This study...

19
Can Large Language Models Reduce the Cost of Extracting Data from Electronic Health Records for Research?
2026-01-11 health informatics 10.64898/2026.01.09.26343792
Top 0.2% (18.4%)

Objective: Much medical data is only available in unstructured electronic health records (EHR). These data can be obtained through manual (human) extraction or programmatic natural language processing (NLP) methods. We estimate that NLP only becomes economically competitive with manual extraction when there are ~6500 EHR records. We have found that there is interest from clinicians and researchers in using NLP on projects with fewer records. We examine whether a large language model (LLM) can be ...

20
ClinAgent: A Five-Layer Architecture for Autonomous Clinical Trial Statistical Programming
2026-01-16 health informatics 10.64898/2026.01.09.26343542
Top 0.2% (18.1%)

Clinical trial statistical programming requires 12-24 FTE-months for a typical Phase 3 study, producing 100-500 tables, listings, and figures (TLFs) across 8-15 ADaM domains. Modern AI coding agents (Augment Code, Claude Code, Cline, Cursor) demonstrate remarkable reasoning capabilities but lack the domain-specific tools needed for clinical programming: they cannot read SAS datasets, parse ADaM specifications, analyze regulatory-relevant log issues, or generate CDISC-compliant code without exten...